Efficient Algorithms for Pattern Matching with General Gaps and Character Classes

Authors

  • Kimmo Fredriksson
  • Szymon Grabowski
Abstract

We develop efficient dynamic programming algorithms for pattern matching with general gaps and character classes. We consider patterns of the form p_0 g(a_0, b_0) p_1 g(a_1, b_1) ... p_{m-1}, where each p_i ⊂ Σ is a character class over a finite alphabet Σ, and g(a_i, b_i) denotes a gap of length between a_i and b_i separating symbols p_i and p_{i+1}. The text symbol t_j matches p_i iff t_j ∈ p_i. Moreover, we require that if p_i matches t_j, then p_{i+1} must match one of the text symbols t_{j+a_i+1} ... t_{j+b_i+1}. Either or both of a_i and b_i can be negative. We give algorithms that have efficient average-case and worst-case running times. The algorithms have important applications in music information retrieval and computational biology. We give experimental results showing that the algorithms work well in practice.
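The matching condition in the abstract can be illustrated with a naive dynamic programming sketch. This is not the authors' algorithm and makes no claim to their running times; it only applies the definition directly. The function name `match_end_positions` and the representation of the pattern as a list of sets (`classes`) plus a list of `(a, b)` pairs (`gaps`) are assumptions made for this example.

```python
def match_end_positions(text, classes, gaps):
    """Naive illustration of matching with gaps and character classes.

    classes: list of sets, classes[i] plays the role of p_i (hypothetical layout).
    gaps:    list of (a_i, b_i) pairs, one between each pair of adjacent classes.
    Returns the sorted text positions at which p_{m-1} can be aligned.
    """
    if len(gaps) != len(classes) - 1:
        raise ValueError("need exactly one gap between adjacent classes")
    n = len(text)
    # reachable[j] is True if p_0 ... p_i can be matched with p_i aligned at text[j].
    reachable = [text[j] in classes[0] for j in range(n)]
    for i, (a, b) in enumerate(gaps):
        nxt = [False] * n
        for j in range(n):
            if not reachable[j]:
                continue
            # p_{i+1} must match one of t_{j+a+1} ... t_{j+b+1}; clip to the text.
            lo = max(0, j + a + 1)
            hi = min(n - 1, j + b + 1)
            for k in range(lo, hi + 1):
                if text[k] in classes[i + 1]:
                    nxt[k] = True
        reachable = nxt
    return [j for j in range(n) if reachable[j]]
```

Because each row is computed from the complete previous row, negative values of a_i and b_i need no special handling here: the next symbol may simply land at or before the position of the current one. The cost of this sketch grows with the text length times the total gap width, which is the kind of overhead that more careful algorithms aim to avoid.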


Similar resources

Fast and Simple Character Classes and Bounded Gaps Pattern Matching, with Applications to Protein Searching

The problem of fast exact and approximate searching for a pattern that contains classes of characters and bounded size gaps (CBG) in a text has a wide range of applications, among which a very important one is protein pattern matching (for instance, one PROSITE protein site is associated with the CBG [RK] - x(2,3) - [DE] - x(2,3) - Y, where the brackets match any of the letters inside, and x(2,...

A Dynamically Reconfigurable FPGA-Based Pattern Matching Hardware for Subclasses of Regular Expressions

In this paper, we propose a novel architecture for large-scale regular expression matching, called dynamically reconfigurable bit-parallel NFA architecture (Dynamic BP-NFA), which allows dynamic loading of regular expressions on-the-fly as well as efficient pattern matching for fast data streams. This is the first dynamically reconfigurable hardware with guaranteed performance for the class of ex...

BLIM: A New Bit-Parallel Pattern Matching Algorithm Overcoming Computer Word Size Limitation

Bitwise operations execute very fast on computer hardware. Algorithms that aim to benefit from this intrinsic property can be classified as bit-parallel algorithms. Bit-parallelism has been widely investigated in the pattern matching area since the introduction of the Shift-Or algorithm. In the original idea, there was no shift mechanism, and the input pattern length is required to be less ...

Modulated string searching

In his 1987 paper "Generalized String Matching", Abrahamson introduced the concept of pattern matching with character classes and provided the first efficient algorithm to solve this problem. The best known solution to date is due to Linhart and Shamir (2009). Another broad yet comparatively less intensively studied class of string matching problems is numerical string searching, such as ...

Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap

We examine the complexity of the online Dictionary Matching with One Gap Problem (DMOG), which is defined as follows. Preprocess a dictionary D of d patterns, where each pattern contains a special gap symbol that can match any string, so that given a text that arrives online, a character at a time, we can report all of the patterns from D that are suffixes of the text that has arrived so far, before ...


Publication date: 2006